Additional studies of the probability that the events with a
superjet observed by CDF are consistent with the SM prediction

In the W + 2,3 jet data collected by CDF during the 1992-1995 Fermilab collider run,
13 events were observed to contain a superjet when 4.4 ± 0.6 events are expected. A
previous article detailed the selection and the kinematical properties of these events.
The present paper provides estimates of the probability that the kinematics of these 13
events is statistically consistent with the standard model prediction.
PACS number(s): 13.85.Qk, 13.38.Be, 13.20.He 



I. INTRODUCTION 

The CDF experiment has reported an excess of events in the W + 2 and W + 3 jet
topologies in which the presumed heavy-flavor jet contains a soft lepton (SLT tag) in
addition to a secondary vertex (SECVTX tag) [1]. The rate of these events (13 observed)
is larger than what is predicted by a simulation of known standard model (SM) processes
(4.4 ± 0.6 events expected, including single and pair production of top quarks). Various
kinematical distributions of these events are compared in Ref. [1] to what is expected if the
excess were simply due to a statistical fluctuation of the SM contributions. The simulation
is cross-checked by comparing to a complementary sample of 42 W + 2 and W + 3 jet events
with SECVTX tags but no supertags. According to the simulation [1], events with a superjet
and the complementary data set have quite similar heavy flavor composition. A set of 18
kinematical variables was chosen a priori to look for differences between data and simulation.
Each data distribution is compared to the SM expectation using a Kolmogorov-Smirnov (K-S)
test [2,3]. The probability P that each distribution is consistent with the SM simulation
is derived with Monte Carlo pseudo-experiments which include Poisson fluctuations and
Gaussian uncertainties in the prediction of each standard model contribution.
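
As a minimal illustration of how such a probability P can be obtained, the sketch below computes a K-S distance between a sample and a model cumulative distribution, and then estimates P as the fraction of pseudo-experiments with a K-S distance no smaller than the observed one. It is a toy under stated assumptions: the template is a uniform distribution, the function names `ks_distance` and `ks_probability` are hypothetical, and the Poisson and Gaussian fluctuations of the SM contributions used in Ref. [1] are not modeled.

```python
import random

def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_probability(d_obs, n, cdf, draw, n_pseudo=2000, seed=1):
    """Fraction of pseudo-experiments whose K-S distance is >= d_obs."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_pseudo):
        pseudo = [draw(rng) for _ in range(n)]
        if ks_distance(pseudo, cdf) >= d_obs:
            count += 1
    return count / n_pseudo

# Toy usage (assumption): a uniform template on [0, 1] and a 3-event sample.
def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)

d = ks_distance([0.1, 0.5, 0.9], uniform_cdf)
p = ks_probability(d, 3, uniform_cdf, lambda rng: rng.random())
```

In a realistic setting the template would be the simulated SM distribution of one kinematical variable, and each pseudo-experiment would draw 13 events from it.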

In Ref. [1], a subset of 9 kinematical variables is selected a posteriori to illustrate the
main differences between the data and the simulation: $E_T^{l}$ and $\eta^{l}$, the transverse
energy and pseudo-rapidity of the primary lepton ($l$); $E_T^{suj}$ and $\eta^{suj}$, the
transverse energy and pseudo-rapidity of the superjet ($suj$); $E_T^{b}$ and $\eta^{b}$, the
transverse energy and pseudo-rapidity of the additional jets ($b$) in the event;
$E_T^{l+b+suj}$ and $y^{l+b+suj}$, the transverse energy and rapidity of the system
$l + b + suj$; and $\delta\phi^{l,b+suj}$, the azimuthal angle between the primary lepton and
the system $b + suj$ composed of the superjet and the other jets in the event. The first 8
variables test whether the production cross section $d^2\sigma / dp_T\, d\eta$ of each object
in the final state is consistent with the SM simulation, and the ninth variable tests whether
the data are consistent with the production and decay of W bosons from known sources. Table I
summarizes the probabilities of these comparisons. The SM simulation correctly models the
complementary sample of data, but has a systematically low probability of being consistent
with the kinematic distributions of the events with a superjet. The use of this subset of
variables is well motivated by the fact that it provides a simple way to describe in full the
kinematics of the final state with relatively modest correlations. However, it is not the only
possible choice.

Footnote: Such a double tag is called a supertag in Ref. [1]; jets with a supertag are
referred to as superjets.

Table II lists the results of the K-S test for the other 9 kinematical distributions inspected:
$\not\!\!E_T$, the corrected transverse missing energy; $M_T^{W}$, the W transverse mass
calculated using the primary lepton and $\not\!\!E_T$; $M^{b+suj}$, $y^{b+suj}$, and
$E_T^{b+suj}$, the invariant mass, rapidity, and transverse energy of the system $b + suj$,
respectively; $M^{l+b+suj}$, the invariant mass of the system $l + b + suj$;
$\delta\theta^{b,suj}$ and $\delta\phi^{b,suj}$, the angle and the azimuthal angle between the
superjet and the $b$-jets, respectively; and $\delta\theta^{l,b+suj}$, the angle between the
primary lepton and the system $b + suj$. The simulation correctly models these distributions
for the complementary sample. The probabilities for the events with a superjet are
systematically lower, but the disagreement between data and simulation is much reduced for
this second set of variables. This second set of 9 distributions would have been better suited
to find differences if, for example, events with a superjet were produced by the two-body decay
of a massive object produced in association with a W boson or by the three-body decay of a
massive object produced in association with large $\not\!\!E_T$.

In Sect. II, we first evaluate the combined probability that the data are statistically
consistent with the simulation using different methods in order to estimate the effect of
possible correlations between kinematic variables. We then study the effect of the bias
introduced by the choice of particular sets of kinematical variables which were not motivated
by a specific model or by the analysis of an independent data sample. Section III summarizes
our conclusions.

TABLE I. Results of the K-S comparison between data and simulation for the first set of 9
kinematical variables. P is the probability of making an observation with a K-S distance no
smaller than that of the data.

Variable                   Events with a superjet    Complementary sample
                           P (%)                     P (%)
$E_T^{l}$                  2.6                       70.9
$\eta^{l}$                 0.10                      72.7
$E_T^{suj}$                11.1                      43.0
$\eta^{suj}$               15.2                      73.4
$E_T^{b}$                  6.7                       8.6
$\eta^{b}$                 6.8                       80.0
$E_T^{l+b+suj}$            2.5                       18.8
$y^{l+b+suj}$              13.8                      7.8
$\delta\phi^{l,b+suj}$     1.0                       77.9


TABLE II. Results of the K-S comparison between data and simulation for the second set of 9
kinematical variables.

Variable                   Events with a superjet    Complementary sample
                           P (%)                     P (%)
$\not\!\!E_T$              27.1                      57.1
$M_T^{W}$                  13.1                      38.2
$M^{b+suj}$                4.0                       58.9
$y^{b+suj}$                7.1                       34.9
$E_T^{b+suj}$              24.0                      60.1
$M^{l+b+suj}$              21.0                      33.6
$\delta\theta^{b,suj}$     30.1                      41.1
$\delta\phi^{b,suj}$       15.3                      83.8
$\delta\theta^{l,b+suj}$   37.3                      35.7

II. EVALUATION OF THE COMBINED PROBABILITY



Using the results of the previous section, we first evaluate the combined probability that
the data are statistically consistent with the simulation using the set of 9 kinematical
variables listed in Table I. The combined probability is evaluated with three different
approaches in order to test the sensitivity of the result to the correlations between
kinematical variables.

In the simplest method, we evaluate the probability of observing a value of
$\Pi = \prod_{i=1}^{n} P_i$, where $n$ is the number of kinematic variables, no larger than
that of the data ($\Pi^{0}$). If the kinematical variables are uncorrelated, this probability is

$$\Pi_t = \Pi^{0} \sum_{k=0}^{n-1} \frac{(-\ln \Pi^{0})^{k}}{k!} .$$

This method yields $\Pi_t = 0.46$ for the complementary sample and
$\Pi_t = 1.6 \times 10^{-6}$ for events with a superjet.
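
The closed-form expression for uncorrelated variables can be checked numerically. The short sketch below evaluates it for the two columns of probabilities in Table I (converted from percent); the function name `combined_probability` is a hypothetical choice, and the calculation assumes the nine P values are independent and uniformly distributed under the SM hypothesis.

```python
import math

def combined_probability(probs):
    """P(product of n independent uniform variables <= observed product)."""
    n = len(probs)
    log_prod = sum(math.log(p) for p in probs)  # ln of the observed product
    x = -log_prod
    # Truncated exponential series: sum over k = 0 .. n-1 of x^k / k!
    series = sum(x ** k / math.factorial(k) for k in range(n))
    return math.exp(log_prod) * series

# P values (in percent) from Table I, in the order of the table rows.
superjet = [2.6, 0.10, 11.1, 15.2, 6.7, 6.8, 2.5, 13.8, 1.0]
complementary = [70.9, 72.7, 43.0, 73.4, 8.6, 80.0, 18.8, 7.8, 77.9]

pi_t_superjet = combined_probability([p / 100 for p in superjet])
pi_t_complementary = combined_probability([p / 100 for p in complementary])
```

With the rounded table entries as input, the two results agree with the quoted values of the uncorrelated combined probability to within rounding.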

In the second method, which accounts for the effect of correlations between variables,
we perform a large number of Monte Carlo pseudo-experiments. In each experiment, we
form a set of 8 W + 2 jet and 5 W + 3 jet distinct events randomly extracted from the
simulations of the 12 processes listed in Tables V and VI of Ref. [1]. In each experiment, we
first randomly determine $N_i$, the number of events contributed by each process, separately
for the 2 and 3 jet bins. This is done using as probabilities the ratios $\sigma_i/\sigma$,
where the contribution $\sigma_i$ of each process $i$ (as listed in Tables V and VI of
Ref. [1]) is smeared, in each experiment, by its error using a Gaussian distribution and
$\sigma = \sum_{i=1}^{12} \sigma_i$. We then randomly extract $N_i$ events from the simulation
of each process $i$ to form a sample of 13 events (8 with 2 jets and 5 with 3 jets). We compare
the distributions of the nine kinematical variables to the SM templates by using the same K-S
test of Ref. [1] and derive the product $\Pi$ of the probabilities $P_i$ for each experiment.
The combined probability that the data are consistent with the SM simulation is given by
$\Pi_c$, the fraction of pseudo-experiments which have a probability $\Pi$ no larger than
$\Pi^{0}$. The distribution of the probability product $\Pi$ resulting from $10^{7}$
pseudo-experiments which use simulated events is shown in Figure 1. We find 16
pseudo-experiments with a product of probabilities no larger than that observed for the
superjet data. This corresponds to a combined probability
$\Pi_c = (1.6 \pm 0.4) \times 10^{-6}$ (4.8 $\sigma$ effect).
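
The first step of each pseudo-experiment, the random choice of how many events each process contributes, can be sketched as follows. This is only an illustration: the three toy contributions and errors are hypothetical placeholders rather than the 12 entries of Tables V and VI of Ref. [1], the name `draw_process_counts` is invented here, and clipping negative smeared contributions at zero is an assumption not stated in the text.

```python
import random

def draw_process_counts(contributions, errors, n_events, rng):
    """Return N_i: how many of n_events are drawn from each process."""
    # Smear each contribution by its Gaussian error; clip at zero (assumption).
    smeared = [max(0.0, rng.gauss(c, e)) for c, e in zip(contributions, errors)]
    # Draw n_events process indices with probabilities smeared_i / sum(smeared).
    picks = rng.choices(range(len(smeared)), weights=smeared, k=n_events)
    counts = [0] * len(smeared)
    for i in picks:
        counts[i] += 1
    return counts

rng = random.Random(7)
toy_contributions = [1.5, 0.9, 0.4]  # hypothetical expected events per process
toy_errors = [0.3, 0.2, 0.1]         # hypothetical uncertainties

# The 2-jet and 3-jet bins are treated separately: 8 and 5 events respectively.
counts_2jet = draw_process_counts(toy_contributions, toy_errors, 8, rng)
counts_3jet = draw_process_counts(toy_contributions, toy_errors, 5, rng)
```

Each count would then determine how many simulated events are extracted from the corresponding process before the K-S comparisons are repeated.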

We have also performed pseudo-experiments in which we compare the SM simulation to
13 different events extracted randomly from the complementary sample of data consisting
of 42 events. For each experiment, we compare the kinematical distributions of each sample
to the SM templates and derive the product of probabilities $\Pi$. Figure 2 shows the $\Pi$
distribution of the 10^ pseudo-experiments. The probability that 13 events randomly extracted
from the control sample have a product $\Pi$ no larger than the data is (1.4 ± 0.4) x 10~^. In
other words, it is very hard to find, among these particular 42 events, a subsample of 13
events that disagrees with the SM simulation as much as the superjet sample.
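
For orientation, the subsampling step can be sketched as below. The event identifiers are placeholders, and the variable names are hypothetical; the point is only that 13 events are drawn without replacement from 42, and that the number of distinct subsamples (of order $10^{10}$) is large enough that repeated draws are rarely identical.

```python
import math
import random

n_complementary, n_subsample = 42, 13

# Number of distinct 13-event subsamples of the 42-event sample.
n_distinct = math.comb(n_complementary, n_subsample)

rng = random.Random(3)
event_ids = list(range(n_complementary))  # placeholder event labels
# One pseudo-experiment: 13 different events, drawn without replacement.
subsample = rng.sample(event_ids, n_subsample)
```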

We have studied a few effects which might influence the low value of the combined 
probability. 

As observed in Section V D of Ref. [1], the rapidity distributions of the objects in the
final state are quite asymmetric. Since we know of no physics process that would produce
such asymmetries, it is possible that they are due to an obscure detector problem, not seen
in other data samples, or to a low-probability statistical fluctuation. Therefore, it is of
interest to understand the effect of these asymmetries on the low value of the combined
probability. We have done this by comparing the 9 observed and simulated distributions
using the absolute values of the pseudo-rapidities. This test also yields a small value of the
combined probability ($\Pi_t$ = 4.5 x 10~^).

The combined probability value depends on the estimate of the contribution of each SM
process and its uncertainty. We have studied the effect of varying the fraction of $t\bar{t}$
events. If we make the hypothesis that the data are contributed only by $t\bar{t}$ events,
$\Pi_t$ grows to 1.2 x 10^^ for the events with a superjet and decreases to 0.8 x 10^^ for
the complementary sample.

We next study the bias due to the use of a particular set of kinematical variables which,
while quite reasonable and well motivated, was not chosen a priori. For example, we could
have evaluated the combined probability using a slightly different set of 8 kinematic variables:
$E_T^{l}$, $\eta^{l}$, $E_T^{suj}$, $\eta^{suj}$, $E_T^{b}$, $\eta^{b}$, $\not\!\!E_T$, and
$M_T^{W}$. This set does not describe the kinematics of all objects in the final state as
completely as the previous one, but it contains some variables which are more intuitive. In
this case, we derive the following combined probabilities: $\Pi_t = 7.4 \times 10^{-5}$ and
$\Pi_c = (2.5 \pm 0.5) \times 10^{-5}$ (4.2 $\sigma$ effect).

Events with a superjet are not very anomalous when using the set of 9 kinematical
variables listed in Table II. Using this set of variables, the combined probabilities for events
with a superjet are $\Pi_t = 1.9 \times 10^{-2}$ and $\Pi_c = 2.3 \times 10^{-2}$.

The effect of the bias due to the a posteriori choice of a particular set of kinematical
variables is removed by evaluating the combined probability for all the 18 kinematical variables
inspected. In this case, the probability that the data are consistent with the simulation is
$\Pi_t = 0.67$ for the complementary sample and $\Pi_t = 6.0 \times 10^{-7}$ for events with a
superjet. This estimate of the combined probability does not account for the effect of large
correlations between a few of the 18 kinematical variables. With $10^{6}$ pseudo-experiments
which use simulated events, we evaluate that in this case the combined probability for events
with a superjet is $\Pi_c = (3.4 \pm 0.6) \times 10^{-5}$. The $\Pi$ distribution resulting from
these pseudo-experiments is shown in Figure 3.

FIG. 1. Distribution of the product $\Pi$ of 9 probabilities obtained with $10^{7}$
pseudo-experiments which use 13 events randomly extracted from the SM simulation (see text).
The arrow indicates the $\Pi$ value of the data. The inset shows the $\Pi$ distribution in full.

FIG. 2. Distribution of the product $\Pi$ of 9 probabilities of 13 events extracted randomly
from the complementary sample of 42 events. The arrow indicates the $\Pi$ value of the data.

FIG. 3. Distribution of the product $\Pi$ of 18 probabilities obtained with $10^{6}$
pseudo-experiments which use 13 events randomly extracted from the SM simulation (see text).
The arrow indicates the $\Pi$ value of the data.

III. CONCLUSIONS



Having taken into account the correlations between kinematical variables, we estimate
that the combined probability that the events with a superjet, reported by the CDF
collaboration in Ref. [1], are statistically consistent with the SM simulation is
$(1.6 \pm 0.4) \times 10^{-6}$ (4.8 $\sigma$ effect). This probability is derived using a
particular set of 9 kinematical variables, selected a posteriori from a larger set of 18,
which was chosen a priori in order to search for differences between data and simulation. The
effect of the bias due to the a posteriori selection of particular sets of variables cannot be
unambiguously assessed. We have therefore evaluated the combined probability that these events
are consistent with the simulation using all kinematical variables which have been inspected.
We find that the combined probability remains low [$(3.4 \pm 0.6) \times 10^{-5}$
(4.1 $\sigma$ effect)].
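
The relation between a counted fraction of pseudo-experiments, its statistical error, and the quoted number of standard deviations can be sketched as follows. The two-sided Gaussian convention used here is an assumption (the text does not state the convention), although it is consistent with the quoted values; the function names are hypothetical.

```python
from statistics import NormalDist

def fraction_with_error(n_pass, n_total):
    """Fraction n_pass / n_total with a sqrt(n_pass) counting error."""
    return n_pass / n_total, n_pass ** 0.5 / n_total

def two_sided_sigma(p):
    """Gaussian z such that P(|Z| >= z) = p (two-sided convention, assumed)."""
    return -NormalDist().inv_cdf(p / 2)

# 16 pseudo-experiments out of 10^7 below the data value of the product.
frac, err = fraction_with_error(16, 10 ** 7)

z_9_variables = two_sided_sigma(1.6e-6)
z_18_variables = two_sided_sigma(3.4e-5)
```

Under this convention the two probabilities correspond to roughly 4.8 and 4.1 standard deviations, respectively.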